Pipeline, March/April 1996, vol.7, no.2
Copyright © 1996 Silicon Graphics


Kernel Processes in IRIX 5.3 and IRIX 6.1

This article briefly describes the kernel processes in IRIX 5.3 and IRIX 6.1
that users ask about most often: shaked, bdflush, vfs_sync, pdflush, bpqueue
and xfsd. None of these kernel processes has a manual page, and the
information offered in the on-line manuals is limited. All of them deal with
freeing up dynamically allocated kernel memory.

Two other important kernel processes, sched and vhand, are not discussed in this article because they are well-known processes common to Unix environments.

A kernel process is created by the kernel at boot time and always runs in kernel mode. All code for kernel processes resides within the Unix kernel, /unix.

Normally, kernel processes run at a high non-degrading priority, NDPHIMIN, so that they are not preempted by a normal priority process. Executing at NDPHIMIN also allows processes with a higher real-time priority (priorities NDPHIMAX through NDPHIMIN - 1) to execute before the kernel processes. For an explanation of NDPHIMIN and NDPHIMAX, refer to the manual page for schedctl(2).
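The ordering described above can be sketched as a small model. The numeric values below are assumptions chosen for illustration (the article does not state them; check schedctl(2) on the target system). The helper names are hypothetical. The only property taken from the text is that NDPHIMAX through NDPHIMIN - 1 rank ahead of NDPHIMIN, so a numerically smaller value is treated as the higher priority.

```python
# Assumed values for illustration only; consult schedctl(2) for the real ones.
NDPHIMAX = 30  # assumed: highest non-degrading priority
NDPHIMIN = 39  # assumed: priority at which kernel processes normally run


def real_time_band():
    """Priorities that run ahead of the kernel processes: NDPHIMAX .. NDPHIMIN - 1."""
    return range(NDPHIMAX, NDPHIMIN)


def runs_first(prio_a, prio_b):
    """True if a process at prio_a is scheduled ahead of one at prio_b.

    In this model a numerically smaller value means a higher priority.
    """
    return prio_a < prio_b
```

Under these assumptions, a process at priority 33 falls in the real-time band and would run ahead of a kernel process left at NDPHIMIN.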

On multi-processor systems, kernel processes are scheduled to execute across all CPUs so that they run concurrently. There are three means of communicating with kernel processes: systune(1M), npri(1) and syssgi(2).

Shaked

Shaked is a new kernel process that was introduced as part of IRIX 6.1. Prior to IRIX 6.1, its function was handled by the kernel process vhand.

Parts of the kernel, such as device drivers and file systems, dynamically allocate kernel memory. Some of this memory is used for I/O buffers. In order to speed up memory allocation and to facilitate the memory collection process, groups of fixed size memory blocks are preallocated and then assigned when a memory allocation request is made.

When the memory available to the system as a whole falls below a certain amount (defined by the kernel parameter gpgshi, tunable with systune), the kernel takes steps to free (or "shake loose") enough memory to continue functioning. Shaked frees preallocated memory blocks previously allocated by the kernel. Shaked also performs I/O on the memory associated with I/O buffers so that this memory can be freed.
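The idea of preallocating fixed-size blocks and later shaking the unused ones loose can be sketched as follows. This is a simplified model, not the IRIX allocator: the class, the batch size and the threshold argument are all hypothetical, and the threshold merely stands in for the role the article assigns to gpgshi.

```python
class BlockPool:
    """Pool of fixed-size blocks that are preallocated in batches."""

    def __init__(self, block_size, batch=4):
        self.block_size = block_size
        self.batch = batch
        self.free = []  # preallocated blocks not currently handed out

    def alloc(self):
        if not self.free:
            # Preallocate a whole batch so that later requests are cheap.
            self.free.extend(bytearray(self.block_size) for _ in range(self.batch))
        return self.free.pop()

    def release(self, block):
        # Returned blocks stay in the pool until it is shaken.
        self.free.append(block)

    def shake(self):
        """Give up every unused preallocated block; return the bytes freed."""
        freed = len(self.free) * self.block_size
        self.free.clear()
        return freed


def shake_if_low(pools, free_bytes, threshold):
    """Shake all pools only when free memory is below the threshold."""
    if free_bytes >= threshold:
        return 0
    return sum(pool.shake() for pool in pools)
```

Blocks that are still in use are untouched by shake(); only the preallocated surplus is given back, which is why the pool can keep serving requests afterwards.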

Bdflush

Bdflush is responsible for flushing delayed write efs(4) I/O buffers from memory to disk. A delayed write buffer is a block of file data that is currently held in memory by the kernel and has not yet been written to disk. The kernel delays the write of the buffer until absolutely necessary to increase I/O performance. If an I/O operation is performed on the delayed write buffer, no disk access is necessary to retrieve the buffer from disk since it is already resident in memory. Whenever possible, a number of buffers are written at one time in a single call to the disk driver (e.g., if the disk blocks are contiguous).
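A minimal sketch of delayed writes and of grouping contiguous blocks into one driver call is shown below. The names are hypothetical and the "driver" is just a list that records each call; the real buffer cache is considerably more involved.

```python
def coalesce(block_numbers):
    """Group block numbers into (start, count) runs of contiguous blocks."""
    runs = []
    for blk in sorted(set(block_numbers)):
        if runs and blk == runs[-1][0] + runs[-1][1]:
            runs[-1][1] += 1  # block directly follows the current run
        else:
            runs.append([blk, 1])  # start a new run
    return [tuple(run) for run in runs]


class DelayedWriteCache:
    """Holds written blocks in memory until flush() is called."""

    def __init__(self):
        self.buffers = {}       # block number -> data held in memory
        self.dirty = set()      # blocks not yet written to disk
        self.driver_calls = []  # one (start, count) entry per simulated driver call

    def write(self, blk, data):
        self.buffers[blk] = data
        self.dirty.add(blk)     # the write to disk is delayed

    def read(self, blk):
        # A resident buffer is returned without any disk access;
        # None models a miss that would have to go to disk.
        return self.buffers.get(blk)

    def flush(self):
        for start, count in coalesce(self.dirty):
            self.driver_calls.append((start, count))
        self.dirty.clear()
```

For example, writing blocks 10 and 11 and then flushing records a single driver call covering both blocks, while a read of block 10 before the flush is served from memory.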

By default, bdflush is awakened every second. Although this interval is no longer a tunable kernel parameter, it can be changed for a specific instance with the syssgi SGI_BDFLUSHCNT system call. Root can also use npri to change the priority of bdflush (or any other kernel process) until the next reboot. For example, if the process ID of bdflush is 3, the following command changes the priority of bdflush to 33 until the next reboot:

# npri -h 33 -p 3

Bdflush does not examine every I/O buffer each time it is awakened. Instead, the kernel cycles through a list of I/O buffers and, on each wakeup, examines a fraction of them equal to one divided by the value of bdflushr. Delayed write I/O buffers are flushed only if they are older than the number of seconds specified in the systune variable autoup.
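The scanning policy can be modeled in a few lines. This is a sketch under stated assumptions: the class names, the round-up of the fraction and the circular cursor are illustrative choices, not details confirmed by the article. Only the two rules come from the text: examine 1/bdflushr of the buffers per wakeup, and flush a delayed write buffer only when it is older than autoup seconds.

```python
class Buffer:
    def __init__(self, name, dirtied_at):
        self.name = name
        self.dirtied_at = dirtied_at  # time the buffer became delayed-write; None if clean


class BdflushModel:
    def __init__(self, buffers, bdflushr, autoup):
        self.buffers = buffers
        self.bdflushr = bdflushr  # examine 1/bdflushr of the buffers per wakeup
        self.autoup = autoup      # minimum age in seconds before a buffer is flushed
        self.cursor = 0           # where the next wakeup resumes scanning

    def wakeup(self, now):
        """Examine one slice of the buffer list; return the names flushed."""
        total = len(self.buffers)
        slice_len = -(-total // self.bdflushr)  # ceiling division (assumed rounding)
        flushed = []
        for _ in range(min(slice_len, total)):
            buf = self.buffers[self.cursor]
            self.cursor = (self.cursor + 1) % total
            if buf.dirtied_at is not None and now - buf.dirtied_at > self.autoup:
                flushed.append(buf.name)
                buf.dirtied_at = None  # buffer is clean once written
        return flushed
```

With four buffers and bdflushr set to 2, each wakeup looks at two buffers, so a given buffer is revisited every second wakeup; a buffer that is too young on one pass is simply picked up on a later one.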

Vfs_sync

Vfs_sync is similar to bdflush except that it flushes file system metadata, such as superblocks and dirty virtual file system nodes, to disk. By default, this process is awakened every 30 seconds; the interval is defined by vfs_syncr and tunable with systune, and it can also be changed for a specific instance using the syssgi SGI_BDFLUSHCNT system call. If necessary, vfs_sync will cycle through all mounted file systems.

Pdflush

Pdflush writes dirty pages that have been memory mapped with mmap(2) to disk. A page becomes dirty when a process writes to it. If there are any dirty memory mapped pages to flush to disk after bdflush has been awakened, pdflush is invoked.
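From user space, a dirty memory mapped page looks like the following sketch. It uses Python's standard mmap module on a temporary file; the function name is hypothetical. The explicit flush() call forces the write-back that would otherwise be left to the kernel, which is the job the article attributes to pdflush on IRIX.

```python
import mmap
import os
import tempfile


def write_through_mapping(payload):
    """Dirty a memory mapped page, force it to disk, and read the file back."""
    fd, path = tempfile.mkstemp()
    try:
        # The file must be at least as large as the region being mapped.
        os.ftruncate(fd, mmap.PAGESIZE)
        with mmap.mmap(fd, mmap.PAGESIZE) as mapping:
            mapping[:len(payload)] = payload  # storing into the mapping dirties the page
            mapping.flush()                   # explicit write-back of the dirty page
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))      # read through the file, not the mapping
    finally:
        os.close(fd)
        os.unlink(path)
```

Without the flush() call the data would still reach the file eventually; the point of the example is only that a store into the mapping, not a write(2) call, is what makes the page dirty.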

Bpqueue

Currently, bpqueue handles guaranteed-rate I/O requests in an xfs(4) environment. Refer to the grio(5) manual page for more information.

Xfsd

Xfsd processes certain delayed write I/O buffers for xfs files, much as bdflush does for efs files. Specifically, it handles delayed write I/O buffers for data for which no disk space has been reserved. Xfsd allocates space on the disk for the buffer being flushed, and then sends the buffer to the disk. The number of xfsd processes ranges from 4 to 12 depending on the amount of physical memory in the system.

Conclusion

There are many kernel processes within IRIX 5.3 and 6.1 that work together to keep dynamically allocated kernel memory to a minimum without compromising system throughput. This is achieved by freeing preallocated memory, flushing I/O buffers to disk, and by using multiple kernel processes to do this work. Each kernel process executes at a high non-degrading priority so that if there is work to be done, it will get ample time to execute. Communication with kernel processes is limited to using npri to change priority, syssgi to temporarily change an attribute of a kernel process, and systune to adjust runtime parameters.